poisson (Poisson distribution)#
The Poisson distribution models the number of events that occur in a fixed exposure (time, area, volume, etc.) when events happen independently at a constant average rate.
This notebook uses the same parameterization as scipy.stats.poisson:
mu (often written (\lambda)) = expected count in the exposure, (\mu \ge 0)
Learning goals#
By the end you should be able to:
recognize when a Poisson model is appropriate (and when it isn’t)
write down the PMF/CDF and key properties
derive the mean, variance, and likelihood / MLE
implement Poisson sampling using NumPy only
visualize PMF/CDF and validate with Monte Carlo simulation
use scipy.stats.poisson for computation and basic estimation workflows
Table of contents#
Title & Classification
Intuition & Motivation
Formal Definition
Moments & Properties
Parameter Interpretation
Derivations
Sampling & Simulation
Visualization
SciPy Integration
Statistical Use Cases
Pitfalls
Summary
import math
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=6, suppress=True)
rng = np.random.default_rng(42)
1) Title & Classification#
Name: poisson (Poisson distribution)
Type: Discrete
Support: (k \in {0, 1, 2, \dots})
Parameter space: (\lambda \in [0, \infty)) (called mu in SciPy)
A useful interpretation is (\lambda = r \times \text{exposure}), where (r) is an event rate per unit exposure (e.g., “calls per hour”).
2) Intuition & Motivation#
What this distribution models#
A Poisson random variable (X) counts how many events occur in a fixed exposure when:
events occur one-at-a-time,
the event rate is approximately constant over the exposure,
counts in disjoint sub-intervals are independent (independent increments).
A canonical construction is the Poisson process: if events arrive at rate (r) per unit time, then the number of arrivals in an interval of length (T) is [ X \sim \text{Poisson}(\lambda), \qquad \lambda = rT. ]
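The construction can be checked directly by simulation: draw (\text{Exp}(r)) inter-arrival gaps, count how many arrivals land in ([0, T]), and compare the counts to (\text{Poisson}(rT)). A minimal sketch (the choices r, T, n_reps are illustrative):

```python
# Simulate the process itself, then compare window counts to Poisson(r*T).
import numpy as np

rng = np.random.default_rng(0)
r, T = 3.0, 2.0            # rate per unit time, window length (illustrative)
n_reps = 20_000

counts = np.empty(n_reps, dtype=int)
for i in range(n_reps):
    t, n = 0.0, 0
    while True:
        t += rng.exponential(1.0 / r)  # Exp(r) inter-arrival gap
        if t > T:
            break
        n += 1
    counts[i] = n

# Mean and variance should both be close to lambda = r*T = 6.
print(counts.mean(), counts.var())
```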
Typical real-world use cases#
call arrivals to a helpdesk per minute
defects per meter of manufactured material
photons counted by a sensor in a fixed time window
insurance claims per policy-year
mutations in a stretch of DNA
web requests to an endpoint per second
In applied modeling, (\lambda) often depends on covariates (Poisson regression / GLMs) and on exposure (an offset).
Relations to other distributions#
Binomial limit (rare events): if (X_n \sim \text{Bin}(n, p)) with large (n), small (p), and (np \to \lambda), then (X_n \Rightarrow \text{Poisson}(\lambda)).
Exponential / Gamma waiting times: in a Poisson process with rate (r), inter-arrival times are (\text{Exp}(r)), and the waiting time to the (k)-th event is (\text{Gamma}(k, \text{rate}=r)).
Additivity: if (X_i \sim \text{Poisson}(\lambda_i)) are independent, then (\sum_i X_i \sim \text{Poisson}(\sum_i \lambda_i)).
Thinning: if you keep each event independently with probability (p), the kept count is (\text{Poisson}(p\lambda)).
Gamma–Poisson mixture: if (\lambda) is random with a Gamma distribution, the marginal count is Negative Binomial (useful for over-dispersion).
Normal approximation: for large (\lambda), (\text{Poisson}(\lambda)) is close to (\mathcal{N}(\lambda, \lambda)) with a continuity correction.
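Several of these relations are easy to spot-check by Monte Carlo; here is a small sketch verifying additivity and thinning (the rates 2, 3 and keep-probability 0.4 are arbitrary):

```python
# Spot-check two closure properties: additivity and thinning.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Additivity: Poisson(2) + Poisson(3) should look like Poisson(5).
s = rng.poisson(2.0, n) + rng.poisson(3.0, n)
print(s.mean(), s.var())        # both near 5

# Thinning: keep each event independently w.p. p -> Poisson(p * lambda).
p = 0.4
x = rng.poisson(5.0, n)
kept = rng.binomial(x, p)       # binomial thinning of each count
print(kept.mean(), kept.var())  # both near p * 5 = 2
```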
3) Formal Definition#
PMF#
For (\lambda \ge 0) and (k \in {0,1,2,\dots}), the probability mass function (PMF) is [ \Pr(X = k \mid \lambda) = \frac{e^{-\lambda} \lambda^k}{k!}. ]
CDF#
The cumulative distribution function (CDF) is [ F(k;\lambda) = \Pr(X \le k) = e^{-\lambda} \sum_{j=0}^{k} \frac{\lambda^j}{j!}. ]
Using the upper incomplete gamma function (\Gamma(s, x)), we can write
[
F(k;\lambda) = \frac{\Gamma(k+1, \lambda)}{\Gamma(k+1)} = Q(k+1, \lambda),
]
where (Q) is the regularized upper incomplete gamma function (implemented in SciPy as scipy.special.gammaincc).
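A quick cross-check of this identity against scipy.special.gammaincc (the rate and range of (k) are arbitrary):

```python
# Verify F(k; lambda) == Q(k+1, lambda), the regularized upper
# incomplete gamma function.
import numpy as np
from scipy.special import gammaincc
from scipy.stats import poisson

lam = 4.2                            # arbitrary rate
ks = np.arange(0, 20)
cdf_direct = poisson.cdf(ks, lam)
cdf_gamma = gammaincc(ks + 1, lam)   # Q(k+1, lam)
print(np.max(np.abs(cdf_direct - cdf_gamma)))  # ~0, up to rounding
```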
4) Moments & Properties#
Mean and variance#
[ \mathbb{E}[X] = \lambda, \qquad \mathrm{Var}(X) = \lambda. ] A key modeling implication is equidispersion: mean equals variance.
Skewness and kurtosis#
For (\lambda > 0): [ \text{skewness} = \frac{1}{\sqrt{\lambda}}, \qquad \text{excess kurtosis} = \frac{1}{\lambda} \quad(\text{kurtosis} = 3 + \tfrac{1}{\lambda}). ]
MGF and characteristic function#
The moment generating function (MGF) and characteristic function are [ M_X(t) = \mathbb{E}[e^{tX}] = \exp\big(\lambda(e^t - 1)\big), \qquad \varphi_X(t) = \mathbb{E}[e^{itX}] = \exp\big(\lambda(e^{it} - 1)\big). ]
Entropy#
There is no simple closed form for the Shannon entropy. One convenient expression (in nats) is [ H(X) = -\sum_{k=0}^{\infty} p(k),\log p(k), \qquad p(k)=\frac{e^{-\lambda}\lambda^k}{k!}. ] For large (\lambda), a useful approximation is [ H(X) \approx \tfrac{1}{2}\log(2\pi e\lambda) - \frac{1}{12\lambda} + \mathcal{O}(\lambda^{-2}). ]
Other useful properties#
Mode: (\lfloor\lambda\rfloor) is a mode; if (\lambda) is an integer, both (\lambda-1) and (\lambda) are modes.
Factorial moments: (\mathbb{E}[X(X-1)\cdots(X-m+1)] = \lambda^m).
Closure under sums: sums of independent Poisson variables remain Poisson.
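The factorial-moment identity lends itself to a quick Monte Carlo check (sample size and (\lambda) chosen arbitrarily):

```python
# Monte Carlo check of E[X(X-1)...(X-m+1)] = lambda^m for small m.
import numpy as np

rng = np.random.default_rng(2)
lam = 3.0
x = rng.poisson(lam, 500_000).astype(float)

for m in (1, 2, 3):
    prod = np.ones_like(x)
    for j in range(m):
        prod *= (x - j)              # falling factorial x(x-1)...(x-m+1)
    print(m, prod.mean(), lam**m)    # empirical vs theoretical lambda^m
```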
def _validate_mu(mu):
if isinstance(mu, bool):
raise TypeError("mu must be a real number, not bool")
mu = float(mu)
if mu < 0:
raise ValueError("mu must be >= 0")
return mu
def poisson_pmf_array(mu, *, tail=1e-12, max_k=None):
'''Return (ks, pmf) over k=0..K where CDF is ~1-tail.
Notes:
- This is a *truncated* representation of an infinite-support distribution.
- It is accurate when the remaining tail mass beyond K is negligible.
'''
mu = _validate_mu(mu)
if mu == 0.0:
return np.array([0], dtype=int), np.array([1.0], dtype=float)
if max_k is None:
# Heuristic upper bound: mean + several std devs.
max_k = int(np.ceil(mu + 12.0 * np.sqrt(mu + 1.0) + 10.0))
pmf = []
p0 = math.exp(-mu) # underflows for extremely large mu
pmf.append(p0)
cdf = p0
k = 0
while cdf < 1.0 - tail and k < max_k:
k += 1
pmf.append(pmf[-1] * mu / k)
cdf += pmf[-1]
ks = np.arange(len(pmf), dtype=int)
pmf = np.asarray(pmf, dtype=float)
return ks, pmf
def poisson_logpmf(k, mu):
'''Log PMF for Poisson(mu). Returns -inf outside support.'''
mu = _validate_mu(mu)
k_arr = np.asarray(k)
out = np.full(k_arr.shape, -np.inf, dtype=float)
k_int = k_arr.astype(int)
valid = (k_arr == k_int) & (k_int >= 0)
if not np.any(valid):
return out
if mu == 0.0:
out[valid & (k_int == 0)] = 0.0
return out
kv = k_int[valid]
# log(k!) via log-gamma: log(k!) = lgamma(k+1)
log_fact = np.vectorize(lambda x: math.lgamma(x + 1.0), otypes=[float])(kv)
out[valid] = -mu + kv * np.log(mu) - log_fact
return out
def poisson_pmf(k, mu):
return np.exp(poisson_logpmf(k, mu))
def poisson_moments(mu):
mu = _validate_mu(mu)
if mu == 0.0:
return {
"mean": 0.0,
"var": 0.0,
"skewness": float("nan"),
"excess_kurtosis": float("nan"),
"kurtosis": float("nan"),
}
return {
"mean": mu,
"var": mu,
"skewness": 1.0 / math.sqrt(mu),
"excess_kurtosis": 1.0 / mu,
"kurtosis": 3.0 + 1.0 / mu,
}
def poisson_entropy_trunc(mu, *, tail=1e-12):
'''Approximate entropy (nats) by truncating the PMF.'''
ks, pmf = poisson_pmf_array(mu, tail=tail)
pmf = pmf[pmf > 0]
return float(-(pmf * np.log(pmf)).sum())
def poisson_entropy_asymptotic(mu):
'''Large-mu approximation for entropy (nats).'''
mu = _validate_mu(mu)
if mu == 0.0:
return 0.0
return float(0.5 * np.log(2.0 * np.pi * np.e * mu) - 1.0 / (12.0 * mu))
mu = 6.0
moments = poisson_moments(mu)
{
**moments,
"entropy_trunc_nats": poisson_entropy_trunc(mu),
"entropy_asymptotic_nats": poisson_entropy_asymptotic(mu),
}
{'mean': 6.0,
'var': 6.0,
'skewness': 0.4082482904638631,
'excess_kurtosis': 0.16666666666666666,
'kurtosis': 3.1666666666666665,
'entropy_trunc_nats': 2.299299563148505,
'entropy_asymptotic_nats': 2.3009293789298115}
# Monte Carlo check (matches formulas up to sampling error)
mu = 8.0
samples = rng.poisson(lam=mu, size=200_000)
est_mean = samples.mean()
est_var = samples.var(ddof=0)
{
"formula_mean": mu,
"mc_mean": float(est_mean),
"formula_var": mu,
"mc_var": float(est_var),
}
{'formula_mean': 8.0,
'mc_mean': 8.00111,
'formula_var': 8.0,
'mc_var': 7.9929587679}
5) Parameter Interpretation#
The single parameter (\lambda) (SciPy: mu) is both:
the mean number of events in the exposure
the variance of the count
If (\lambda = rT) comes from a Poisson process, then:
(r) is a rate (“events per unit exposure”)
(T) is the exposure (“how long / how big the window is”)
Shape changes#
Small (\lambda): most mass is at 0 and 1, strongly right-skewed.
Moderate (\lambda): the distribution spreads out; the mode moves right.
Large (\lambda): the distribution becomes approximately symmetric and close to (\mathcal{N}(\lambda,\lambda)).
mu_values = [0.5, 1.5, 4.0, 10.0]
mu_max = max(mu_values)
ks, _ = poisson_pmf_array(mu_max, tail=1e-12)
fig = go.Figure()
for mu in mu_values:
fig.add_trace(
go.Scatter(
x=ks,
y=poisson_pmf(ks, mu),
mode="markers+lines",
name=f"mu={mu}",
)
)
fig.update_layout(
title="Poisson PMF for different mu",
xaxis_title="k",
yaxis_title="P(X=k)",
)
fig.show()
6) Derivations#
Expectation#
Starting from the PMF, [ \mathbb{E}[X] = \sum_{k=0}^\infty k,\frac{e^{-\lambda}\lambda^k}{k!}. ] Use the identity (k\lambda^k/k! = \lambda,\lambda^{k-1}/(k-1)!) to shift the sum: [ \mathbb{E}[X] = e^{-\lambda}\sum_{k=1}^\infty k\frac{\lambda^k}{k!} = \lambda e^{-\lambda}\sum_{k=1}^\infty \frac{\lambda^{k-1}}{(k-1)!} = \lambda e^{-\lambda}\sum_{j=0}^\infty \frac{\lambda^{j}}{j!} = \lambda. ]
Variance#
A standard route is via factorial moments: [ \mathbb{E}[X(X-1)] = \sum_{k=0}^\infty k(k-1),\frac{e^{-\lambda}\lambda^k}{k!}. ] Since (k(k-1)\lambda^k/k! = \lambda^2,\lambda^{k-2}/(k-2)!), the same shift gives (\mathbb{E}[X(X-1)] = \lambda^2). Then [ \mathrm{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2 = \big(\mathbb{E}[X(X-1)] + \mathbb{E}[X]\big) - \lambda^2 = (\lambda^2 + \lambda) - \lambda^2 = \lambda. ]
Likelihood (i.i.d. sample)#
If (x_1,\dots,x_n) are i.i.d. (\text{Poisson}(\lambda)), the likelihood is [ L(\lambda) = \prod_{i=1}^n \frac{e^{-\lambda}\lambda^{x_i}}{x_i!}. ] The log-likelihood is [ \ell(\lambda) = \sum_{i=1}^n \big(x_i\log\lambda - \lambda - \log(x_i!)\big). ] Differentiate and set to zero: [ \ell'(\lambda) = \frac{\sum_i x_i}{\lambda} - n = 0 \quad\Rightarrow\quad \hat\lambda_{\text{MLE}} = \frac{1}{n}\sum_{i=1}^n x_i. ] So the MLE is the sample mean.
# Visualize the log-likelihood for lambda (single observation)
k_obs = 7
lam_grid = np.linspace(1e-6, 20.0, 600)
logL = k_obs * np.log(lam_grid) - lam_grid - math.lgamma(k_obs + 1)
lam_hat = k_obs
fig = go.Figure()
fig.add_trace(go.Scatter(x=lam_grid, y=logL, mode="lines", name="log-likelihood"))
fig.add_vline(
x=lam_hat,
line_dash="dash",
line_color="black",
annotation_text=f"MLE λ̂={lam_hat}",
)
fig.update_layout(title=f"Poisson log-likelihood (k={k_obs})", xaxis_title="λ", yaxis_title="ℓ(λ)")
fig.show()
7) Sampling & Simulation#
Below are two NumPy-only samplers that illustrate common ideas.
A) Knuth’s algorithm (product of uniforms)#
A Poisson process perspective: the number of events in time (T) with rate (r) is Poisson with (\lambda=rT). Knuth’s algorithm draws uniforms and multiplies them until the product drops below (e^{-\lambda}). The loop runs about (\lambda) iterations on average, so it is excellent for small (\lambda) and slow for large (\lambda).
B) Inverse CDF sampling (with truncated tail)#
If we can compute the CDF (F(k)) on (k=0,1,2,\dots,K) such that (F(K)\approx 1), then sampling is:
draw (U\sim\text{Uniform}(0,1))
return the smallest (k) such that (F(k)\ge U) (via np.searchsorted)
This is exact up to the (tiny) omitted tail mass beyond (K).
def sample_poisson_knuth(mu, size=1, *, rng: np.random.Generator):
mu = _validate_mu(mu)
size = (size,) if isinstance(size, int) else tuple(size)
out = np.empty(size, dtype=int)
if mu == 0.0:
out.fill(0)
return out
L = math.exp(-mu)
for idx in np.ndindex(out.shape):
k = 0
p = 1.0
while p > L:
k += 1
p *= rng.random()
out[idx] = k - 1
return out
def sample_poisson_inverse_cdf(mu, size=1, *, rng: np.random.Generator, tail=1e-12):
mu = _validate_mu(mu)
size = (size,) if isinstance(size, int) else tuple(size)
if mu == 0.0:
return np.zeros(size, dtype=int)
ks, pmf = poisson_pmf_array(mu, tail=tail)
cdf = np.cumsum(pmf)
cdf[-1] = 1.0 # absorb the omitted tail mass
u = rng.random(size)
idx = np.searchsorted(cdf, u, side="left")
return ks[idx]
mu = 5.0
size = 50_000
x_knuth = sample_poisson_knuth(mu, size=size, rng=rng)
x_inv = sample_poisson_inverse_cdf(mu, size=size, rng=rng)
{
"knuth_mean": float(x_knuth.mean()),
"inv_cdf_mean": float(x_inv.mean()),
"theory_mean": mu,
"knuth_var": float(x_knuth.var(ddof=0)),
"inv_cdf_var": float(x_inv.var(ddof=0)),
"theory_var": mu,
}
{'knuth_mean': 5.00228,
'inv_cdf_mean': 5.00994,
'theory_mean': 5.0,
'knuth_var': 4.999994801600001,
'inv_cdf_var': 4.968961196399998,
'theory_var': 5.0}
8) Visualization#
We’ll visualize:
the PMF (k \mapsto \Pr(X=k))
the CDF (k \mapsto \Pr(X\le k))
a Monte Carlo histogram compared to the theoretical PMF
mu = 7.0
ks, pmf = poisson_pmf_array(mu, tail=1e-12)
cdf = np.cumsum(pmf)
fig_pmf = go.Figure()
fig_pmf.add_trace(go.Bar(x=ks, y=pmf, name="PMF"))
fig_pmf.update_layout(title=f"Poisson PMF (mu={mu})", xaxis_title="k", yaxis_title="P(X=k)")
fig_pmf.show()
fig_cdf = go.Figure()
fig_cdf.add_trace(go.Scatter(x=ks, y=cdf, mode="lines", line_shape="hv", name="CDF"))
fig_cdf.update_layout(title=f"Poisson CDF (mu={mu})", xaxis_title="k", yaxis_title="P(X≤k)")
fig_cdf.show()
# Monte Carlo vs PMF
mc = sample_poisson_inverse_cdf(mu, size=80_000, rng=rng)
hist = np.bincount(mc, minlength=len(ks)) / len(mc)
fig_mc = go.Figure()
fig_mc.add_trace(go.Bar(x=ks, y=hist[: len(ks)], name="MC histogram", opacity=0.6))
fig_mc.add_trace(go.Scatter(x=ks, y=pmf, mode="markers+lines", name="PMF"))
fig_mc.update_layout(
title=f"Monte Carlo vs PMF (mu={mu})",
xaxis_title="k",
yaxis_title="probability",
)
fig_mc.show()
Approximations in action#
Two classic approximations are worth seeing:
Binomial (\to) Poisson when events are rare (large (n), small (p), (np\approx\lambda))
Normal (\approx) Poisson when (\lambda) is large
def binom_pmf(k, n, p):
k_arr = np.asarray(k)
k_int = k_arr.astype(int)
out = np.zeros_like(k_arr, dtype=float)
valid = (k_int == k_arr) & (k_int >= 0) & (k_int <= n)
if not np.any(valid):
return out
kv = k_int[valid]
log_coeff = (
math.lgamma(n + 1)
- np.vectorize(math.lgamma)(kv + 1)
- np.vectorize(math.lgamma)(n - kv + 1)
)
log_pmf = log_coeff + kv * math.log(p) + (n - kv) * math.log1p(-p)
out[valid] = np.exp(log_pmf)
return out
# A) Binomial -> Poisson
n = 300
p = 0.02
lam = n * p
ks = np.arange(0, 25)
pmf_binom = binom_pmf(ks, n, p)
pmf_pois = poisson_pmf(ks, lam)
fig = go.Figure()
fig.add_trace(go.Scatter(x=ks, y=pmf_binom, mode="markers+lines", name=f"Bin(n={n}, p={p})"))
fig.add_trace(go.Scatter(x=ks, y=pmf_pois, mode="markers+lines", name=f"Poisson(mu=np={lam:.1f})"))
fig.update_layout(title="Rare-event limit: Binomial vs Poisson", xaxis_title="k", yaxis_title="PMF")
fig.show()
# B) Normal approximation (continuity-corrected)
lam = 40.0
ks, pmf = poisson_pmf_array(lam, tail=1e-12)
# Approximate P(X=k) ≈ Φ((k+0.5-λ)/sqrt(λ)) - Φ((k-0.5-λ)/sqrt(λ))
from math import erf, sqrt
def std_norm_cdf(z):
return 0.5 * (1.0 + erf(z / sqrt(2.0)))
sigma = math.sqrt(lam)
normal_approx = np.array(
[
std_norm_cdf((k + 0.5 - lam) / sigma) - std_norm_cdf((k - 0.5 - lam) / sigma)
for k in ks
],
dtype=float,
)
fig = go.Figure()
fig.add_trace(go.Scatter(x=ks, y=pmf, mode="markers", name="Poisson PMF"))
fig.add_trace(go.Scatter(x=ks, y=normal_approx, mode="lines", name="Normal approx"))
fig.update_layout(title=f"Normal approximation (mu={lam})", xaxis_title="k", yaxis_title="probability")
fig.show()
9) SciPy Integration#
SciPy provides a full-featured implementation via scipy.stats.poisson.
Common methods:
poisson.pmf(k, mu) / poisson.logpmf(k, mu)
poisson.cdf(k, mu) / poisson.sf(k, mu)
poisson.rvs(mu, size=..., random_state=...)
scipy.stats.fit(poisson, data, bounds=..., method="mle") (generic fitting API for discrete/continuous distributions)
import scipy.stats as st
from scipy.stats import poisson
mu = 6.5
ks = np.arange(0, 25)
pmf_scipy = poisson.pmf(ks, mu)
cdf_scipy = poisson.cdf(ks, mu)
samples_scipy = poisson.rvs(mu, size=20_000, random_state=rng)
# For the canonical Poisson (loc=0), the MLE for mu is just the sample mean.
mu_mle = float(samples_scipy.mean())
# SciPy also provides a general-purpose fitter for (discrete or continuous) distributions.
# Because mu has domain [0, ∞), we supply a finite upper bound for numerical optimization.
fit_res = st.fit(
poisson,
samples_scipy,
bounds={"mu": (0.0, max(1.0, mu_mle * 10.0)), "loc": (0.0, 0.0)},
guess={"mu": mu_mle, "loc": 0.0},
method="mle",
)
{
"pmf_sum_over_range": float(pmf_scipy.sum()),
"cdf_last": float(cdf_scipy[-1]),
"sample_mean": float(samples_scipy.mean()),
"theory_mean": mu,
"mle_mu_hat_closed_form": mu_mle,
"fit_mu_hat_scipy": float(fit_res.params.mu),
"fit_loc_hat_scipy": float(fit_res.params.loc),
"fit_success": bool(fit_res.success),
}
{'pmf_sum_over_range': 0.9999999729280461,
'cdf_last': 0.9999999729280464,
'sample_mean': 6.5089,
'theory_mean': 6.5,
'mle_mu_hat_closed_form': 6.5089,
'fit_mu_hat_scipy': 6.508900029898882,
'fit_loc_hat_scipy': 0.0,
'fit_success': True}
10) Statistical Use Cases#
A) Hypothesis testing (rate / count)#
A common task: test whether an observed count is unusually high or low compared to a baseline rate.
If (X\sim\text{Poisson}(\lambda_0)) under the null, then an upper-tail p-value is [ \text{p-value} = \Pr(X \ge k_\text{obs} \mid \lambda_0) = 1 - F(k_\text{obs}-1; \lambda_0). ]
B) Bayesian modeling (Gamma–Poisson conjugacy)#
With a Gamma prior on (\lambda) and Poisson likelihood, the posterior is also Gamma. This is a workhorse model for counts.
C) Generative modeling#
Poisson counts appear in simulation pipelines (arrivals, defects, clicks). In conditional models, (\lambda) is linked to features via (\lambda_i = \exp(x_i^\top\beta)\times \text{exposure}_i).
# A) Hypothesis testing example: "are we seeing more events than usual?"
from scipy.stats import chi2
k_obs = 12
lambda0 = 5.0
p_upper = poisson.sf(k_obs - 1, lambda0) # P(X >= k_obs)
# Exact (central) confidence interval for lambda using chi-square quantiles
alpha = 0.05
if k_obs == 0:
ci_low = 0.0
else:
ci_low = 0.5 * chi2.ppf(alpha / 2, 2 * k_obs)
ci_high = 0.5 * chi2.ppf(1 - alpha / 2, 2 * (k_obs + 1))
{
"k_obs": k_obs,
"lambda0": lambda0,
"upper_tail_p_value": float(p_upper),
"95%_CI_for_lambda": (float(ci_low), float(ci_high)),
}
{'k_obs': 12,
'lambda0': 5.0,
'upper_tail_p_value': 0.0054530919130093445,
'95%_CI_for_lambda': (6.200575108722218, 20.96158504817696)}
# B) Bayesian modeling: Gamma prior + Poisson likelihood
from scipy.stats import gamma, nbinom
# Prior: lambda ~ Gamma(alpha0, rate=beta0)
alpha0, beta0 = 2.0, 1.0
# Data: n independent Poisson draws (unit exposure)
data = rng.poisson(lam=4.5, size=40)
alpha_post = alpha0 + data.sum()
beta_post = beta0 + len(data)
posterior_mean = alpha_post / beta_post
posterior_ci = gamma.ppf([0.025, 0.975], a=alpha_post, scale=1.0 / beta_post)
# Plot prior vs posterior
lam_grid = np.linspace(0, 12, 600)
fig = go.Figure()
fig.add_trace(
go.Scatter(
x=lam_grid,
y=gamma.pdf(lam_grid, a=alpha0, scale=1.0 / beta0),
mode="lines",
name=f"prior Gamma({alpha0},{beta0}) (rate)",
)
)
fig.add_trace(
go.Scatter(
x=lam_grid,
y=gamma.pdf(lam_grid, a=alpha_post, scale=1.0 / beta_post),
mode="lines",
name=f"posterior Gamma({alpha_post:.1f},{beta_post:.1f}) (rate)",
)
)
fig.update_layout(title="Gamma–Poisson conjugacy", xaxis_title="lambda", yaxis_title="density")
fig.show()
# Posterior predictive for a future count X_new (unit exposure): Negative Binomial
# If lambda ~ Gamma(alpha_post, rate=beta_post) and X|lambda ~ Poisson(lambda), then
# X_new ~ NegBin(n=alpha_post, p=beta_post/(beta_post+1)) in SciPy's parameterization.
n = alpha_post
p = beta_post / (beta_post + 1.0)
ks = np.arange(0, 30)
pred_pmf = nbinom.pmf(ks, n, p)
fig = go.Figure()
fig.add_trace(go.Bar(x=ks, y=pred_pmf, name="posterior predictive"))
fig.update_layout(title="Posterior predictive (Negative Binomial)", xaxis_title="k", yaxis_title="P(X_new=k)")
fig.show()
{
"prior_mean": alpha0 / beta0,
"posterior_mean": float(posterior_mean),
"posterior_95%_CI": (float(posterior_ci[0]), float(posterior_ci[1])),
}
{'prior_mean': 2.0,
'posterior_mean': 4.097560975609756,
'posterior_95%_CI': (3.5013649093750003, 4.7399398087126885)}
# C) Generative modeling example: Poisson regression-style simulation
n = 300
x = rng.normal(size=n)
exposure = rng.uniform(0.5, 2.0, size=n) # e.g. time at risk
beta0, beta1 = 1.0, 0.6
# log-link: lambda_i = exp(beta0 + beta1 * x_i) * exposure_i
lam = np.exp(beta0 + beta1 * x) * exposure
y = rng.poisson(lam=lam)
df = {
"x": x,
"exposure": exposure,
"lambda": lam,
"y": y,
}
fig = px.scatter(df, x="lambda", y="y", opacity=0.6)
fig.update_layout(title="Simulated counts vs true mean (lambda)", xaxis_title="lambda", yaxis_title="count y")
fig.show()
{
"mean_lambda": float(lam.mean()),
"mean_y": float(y.mean()),
"var_y": float(y.var(ddof=0)),
}
{'mean_lambda': 3.76247509267191,
'mean_y': 3.7933333333333334,
'var_y': 10.923955555555557}
11) Pitfalls#
Invalid parameters#
(\lambda < 0) is invalid.
(\lambda = 0) is valid but degenerate: (\Pr(X=0)=1).
Numerical issues#
Direct PMF computation can underflow/overflow for large (k) or (\lambda). Prefer logpmf and stable special functions.
Summing the PMF naively to get the CDF may lose precision in the tails. Prefer scipy.stats.poisson.cdf / scipy.stats.poisson.sf.
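A small illustration of the underflow/overflow point: the naive textbook formula fails outright well before the probability itself is extreme, while log-space evaluation stays finite (the values of lam and k are illustrative):

```python
# The naive formula e^{-lam} * lam**k / k! breaks down in float
# arithmetic; log-space evaluation does not.
import math
from scipy.stats import poisson

lam, k = 1000.0, 1200
print(math.exp(-lam))              # underflows to exactly 0.0
try:
    naive = math.exp(-lam) * lam**k / math.factorial(k)
except OverflowError:
    naive = None                   # lam**k overflows a float first
logp = poisson.logpmf(k, lam)      # evaluated in log space: finite
print(naive, logp)
```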
Modeling issues (the big ones)#
Over-dispersion: real count data often has variance (>) mean (heterogeneity, clustering). Consider Negative Binomial, quasi-Poisson, or hierarchical models.
Zero inflation: too many zeros vs Poisson; consider zero-inflated models.
Non-constant rate / dependence: if the rate changes over time or events cluster, the Poisson process assumptions fail.
Exposure matters: comparing counts without normalizing by exposure can be misleading.
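A sketch of a simple over-dispersion diagnostic, the variance-to-mean ratio; the Gamma mixture below is an illustrative choice whose marginal is Negative Binomial:

```python
# Variance-to-mean ratio as an over-dispersion flag: ~1 for Poisson,
# substantially >1 for the Gamma-mixed (Negative Binomial) sample.
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

pois = rng.poisson(5.0, n)
lam_random = rng.gamma(shape=2.0, scale=2.5, size=n)  # E[lambda] = 5
mixed = rng.poisson(lam_random)                       # over-dispersed counts

print(pois.var() / pois.mean())    # ~1: equidispersed
print(mixed.var() / mixed.mean())  # >1: Var = E[lambda] + Var(lambda)
```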
12) Summary#
poisson(mu) is a discrete distribution on ({0,1,2,\dots}) with parameter (\mu=\lambda\ge 0).
PMF: (\Pr(X=k)=e^{-\lambda}\lambda^k/k!); the CDF can be written via incomplete gamma functions.
Mean = variance = (\lambda); skewness (=1/\sqrt{\lambda}); excess kurtosis (=1/\lambda).
The likelihood yields (\hat\lambda=\bar x) for i.i.d. samples.
Useful in hypothesis tests for rates, in Bayesian models via Gamma–Poisson conjugacy, and in generative simulations for count data.